Terrorism has been a constant hindrance to mankind’s efforts to achieve global peace and prosperity. From hostage situations and hijackings to mass shootings and bombings, terrorist attacks have a profound impact on both the victims and the larger society; they cause physical harm and loss of life, as well as emotional trauma and psychological distress. They can also have long-lasting socio-economic consequences, disrupting trade and commerce, causing job losses, and decreasing investor confidence.
Because the number of recorded terrorist attacks has risen sharply over the last decade, it is crucial to understand their trends and patterns. In this blog post, I examine various aspects of terrorism, including regions, targets, methods, and motives, using three open-source datasets: the Global Terrorism Database (GTD), which contains information on over 180,000 global terrorist attacks from 1970 to 2017; World, Region, Country GDP/GDP per capita, which includes the GDP per capita of different countries from 1960 to 2021; and the World Bank National Accounts data, which provides the fertility rate and net migration of each country from 1955 to 2020.
I hope this project will shed some light on the phenomenon of global terrorism and will equip us better to combat it in the future. So let’s roll up our sleeves and demystify the data from the world of global terrorism.
The animation in Figure 1 shows that there were a significant number of terrorist attacks in the US from 1970 to 2017. This is surprising, especially considering the effort the US has put into tackling terrorism in almost every terrorist-prone country over the past 50 years. So before anything else, let’s analyze the states in the US that have the highest number of such incidents.
Figure 2 shows a score for each US state. The score is calculated by dividing the total number of terrorist attacks in a state by its 2022 population and then rescaling the result (min-max normalization) so that the state with the highest per-capita value is assigned 1 and the state with the lowest is assigned 0. By this measure, New York, Oregon, California, Washington, and Nebraska are the five most terrorist-prone states in the US, and Kentucky, South Carolina, West Virginia, Alaska, and Arkansas are the safest states in terms of the frequency of terrorist attacks.
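To make the scoring concrete, here is a minimal sketch of the per-capita calculation followed by min-max scaling. The state names, counts, and populations are hypothetical toy values, and `min_max_scale` is an illustrative helper rather than the exact function used for Figure 2.

```python
import pandas as pd

# Hypothetical toy counts and populations; not values from the GTD.
toy = pd.DataFrame({
    "State": ["A", "B", "C"],
    "Attacks": [50, 10, 30],
    "Population": [1_000_000, 500_000, 3_000_000],
})


def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a numeric series linearly so its minimum maps to 0 and its maximum to 1."""
    span = series.max() - series.min()
    if span == 0:
        # All values are identical, so there is nothing to rank.
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


# Attacks per resident, then rescaled to the [0, 1] range.
toy["Per Capita"] = toy["Attacks"] / toy["Population"]
toy["Score"] = min_max_scale(toy["Per Capita"])
print(toy.sort_values("Score", ascending=False))
```

Because the scaling is relative, a score of 0 only means "lowest per-capita value in this table", not "no attacks".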
It might be interesting to see the motives behind the terrorist attacks in the US. So, let’s explore them, and to be more specific, let’s compare the motives of the terrorist attacks from 1970-1999 and 2000-2017.
Both word clouds share a common theme of abortion, suggesting that this has been a prominent topic of discussion and conflict for several decades. However, the word clouds also differ in significant ways. The first word cloud, which covers the pre-2000 period, reveals issues that were relevant to Puerto Rico, Vietnam, and African American groups. The second word cloud, which covers the post-2000 period, shows themes related to Iraq, ISIL, and Islamic states. These topics suggest that there has been an increase in religiously motivated attacks over the past 20 years. This shift also reflects changes in the political landscape, both domestically and internationally: from conflicts over communism and racism to religiously motivated terrorism.
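A word cloud is essentially a picture of word frequencies after stop words are removed, so the same comparison can be sketched with plain counts. The motive strings and the stop-word set below are hypothetical placeholders, and `top_words` is an illustrative helper, not part of the original analysis.

```python
import re
from collections import Counter

# Hypothetical stop words; the analysis itself extends NLTK's English list.
STOP_WORDS = {"the", "to", "a", "of", "was", "in", "and", "against", "for"}


def top_words(texts, n=3):
    """Return the n most frequent non-stop-words across a list of texts."""
    words = []
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        words.extend(w for w in tokens if w not in STOP_WORDS)
    return Counter(words).most_common(n)


# Hypothetical motive descriptions grouped by period.
motives_by_period = {
    "1970-1999": ["protest against the war", "opposition to abortion"],
    "2000-2017": ["opposition to abortion", "retaliation for the war"],
}

for period, motives in motives_by_period.items():
    print(period, top_words(motives))
```

Comparing the two frequency lists side by side shows which terms persist across periods and which appear in only one.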
Now, let’s analyze how the frequency of terrorist attacks has changed over the last 50 years.
Code
import pandas as pd
import plotly.express as px

# df_attacks is the GTD DataFrame with the renamed columns ("Year", "Event ID", ...)
# Count the attacks recorded in each year
yearly_freq = df_attacks.groupby("Year")["Event ID"].count().reset_index()
yearly_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)

fig = px.bar(yearly_freq, x="Year", y="Number of Terrorist Attacks",
             title="Frequency of Terrorist Attacks (1970-2017)")
# Center the title and use a transparent background
fig.update_layout(title_x=0.5, height=400,
                  plot_bgcolor="rgba(0,0,0,0)", paper_bgcolor="rgba(0,0,0,0)")
fig.show()
Figure 4: Frequency of Terrorist Attacks
It is apparent from Figure 4 that the number of terrorist attacks reached local minima around 1972 and 2003 and has greatly increased over the last decade. Note that the gap in 1993 reflects missing data in the GTD, not a year with zero attacks.
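A year that is absent from the data is easy to misread as a year with no attacks. One way to surface such gaps, sketched here with hypothetical toy counts, is to reindex the yearly series over the full range of years so that absent years show up as `NaN` instead of silently disappearing.

```python
import pandas as pd

# Hypothetical yearly counts in which one year has no rows at all.
counts = pd.Series({1980: 40, 1981: 55, 1983: 35, 1984: 30}, name="Attacks")

# Reindex over every year in the range: absent years become NaN, not 0.
full_range = range(counts.index.min(), counts.index.max() + 1)
yearly = counts.reindex(full_range)

missing_years = yearly[yearly.isna()].index.tolist()
print(missing_years)
```

Keeping the gap as `NaN` rather than filling it with 0 makes it clear in later plots and statistics that the value is unknown.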
But, what parts of the world have experienced the highest number of terrorist attacks?
Figure 5 shows that the Middle East & North Africa, South Asia, and South America were the three most terrorist-prone regions. On the other hand, Australasia & Oceania, Central Asia, and East Asia were the safest regions in terms of terrorism. It is also worth noting that in every geographical region, bombing and armed assault were the most common forms of attack.
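The claim about the most common forms of attack per region can be checked with a grouped count. The sketch below uses hypothetical toy records with the same column names as the renamed GTD columns ("Region", "Attack Type") and keeps the two most frequent attack types in each region.

```python
import pandas as pd

# Hypothetical records; only the column names mirror the analysis.
events = pd.DataFrame({
    "Region": ["X"] * 6 + ["Y"] * 3,
    "Attack Type": [
        "Bombing", "Bombing", "Bombing", "Armed Assault", "Armed Assault", "Hijacking",
        "Armed Assault", "Armed Assault", "Bombing",
    ],
})

# Count attacks per (region, attack type), most frequent first within each region.
counts = (
    events.groupby(["Region", "Attack Type"])
    .size()
    .reset_index(name="Attacks")
    .sort_values(["Region", "Attacks"], ascending=[True, False])
)

# Keep the two most frequent attack types for every region.
top_two = counts.groupby("Region").head(2)
print(top_two)
```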
Let’s delve deeper to see which countries from these terrorist-prone regions were contributing the highest number of terrorist incidents.
Figure 6: Countries with the Highest Number of Attacks
In Figure 6, we see that Iraq in the Middle East; Afghanistan, Pakistan, and India in South Asia; and Colombia in South America were the most terrorist-prone countries.
The analysis of global terrorism is incomplete without information on terrorist groups. So, let’s see the top 15 most notorious terrorist groups based on the number of casualties from the attacks they have orchestrated.
Code
import pandas as pd
import plotly.express as px

# Total casualties per group; drop "Unknown" before picking the top 15
groupwise_casualty_freq = df_attacks.groupby("Terrorist Group")["Casualties"].sum().reset_index()
groupwise_casualty_freq = groupwise_casualty_freq[groupwise_casualty_freq["Terrorist Group"] != "Unknown"]
groupwise_casualty_freq = groupwise_casualty_freq.sort_values(by="Casualties", ascending=False)[:15]
notorious_groups = list(groupwise_casualty_freq["Terrorist Group"])

# Yearly casualties for those groups
df_notorious_groups = df_attacks[df_attacks["Terrorist Group"].isin(notorious_groups)]
df_notorious_groups = df_notorious_groups.groupby(["Terrorist Group", "Year"])["Casualties"].sum().reset_index()

# Shorten long group names so the legend stays readable
short_names = {
    "Farabundo Marti National Liberation Front (FMLN)": "Farabundo Liberation",
    "Islamic State of Iraq and the Levant (ISIL)": "ISIL",
    "Kurdistan Workers' Party (PKK)": "Kurdistan W.",
    "Liberation Tigers of Tamil Eelam (LTTE)": "Tamil Tigers",
    "New People's Army (NPA)": "New People's Army",
    "Nicaraguan Democratic Force (FDN)": "Nicaraguan Force",
    "Revolutionary Armed Forces of Colombia (FARC)": "Colombian Force",
    "Shining Path (SL)": "Shining Path",
    "Tehrik-i-Taliban Pakistan (TTP)": "Taliban Pakistan",
}
df_notorious_groups["Terrorist Group"] = df_notorious_groups["Terrorist Group"].replace(short_names)

fig = px.line(df_notorious_groups, x="Year", y="Casualties", color="Terrorist Group",
              title="Attacks by Different Terrorist Groups")
fig.update_layout(title_x=0.5, height=500,
                  plot_bgcolor="rgba(0,0,0,0)", paper_bgcolor="rgba(0,0,0,0)")
fig.show()
Figure 7: Attacks by Different Terrorist Groups
One cannot fail to notice the peak in 2001 for Al-Qaida, which is widely taken as the beginning of the rise of other Islamic religious extremist terrorist groups like the Taliban, Al-Shabaab, and Boko Haram. As the steep lines after 2010 in Figure 7 show, the Taliban, Boko Haram, and ISIL appear to have killed more people than the other 12 terrorist groups combined over the last 50 years.
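A comparison like this one can be verified directly from the yearly totals rather than read off the chart. The sketch below is a minimal illustration with hypothetical group names and casualty figures; `focus_groups` stands in for the three groups named above.

```python
import pandas as pd

# Hypothetical yearly casualty totals; not GTD figures.
records = pd.DataFrame({
    "Terrorist Group": ["A", "A", "B", "B", "C", "C"],
    "Year": [2001, 2014, 1985, 1990, 2012, 2015],
    "Casualties": [300, 900, 200, 150, 400, 500],
})

focus_groups = ["A", "C"]  # stand-ins for the groups being compared
is_focus = records["Terrorist Group"].isin(focus_groups)

# Casualties attributed to the focus groups after 2010 ...
focus_recent = records.loc[is_focus & (records["Year"] > 2010), "Casualties"].sum()
# ... versus casualties attributed to all other groups over the whole period.
others_all = records.loc[~is_focus, "Casualties"].sum()

print(focus_recent, others_all, focus_recent > others_all)
```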
So what exactly do these terrorist groups target? Let’s find out.
Code
import pandas as pd
import plotly.express as px

TOP_N = 11

# Count attacks per target type, most frequent first
target_freq = df_attacks.groupby("Target Type")["Event ID"].count().reset_index()
target_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
target_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)

rem_freq = target_freq[TOP_N:]     # remaining target types (not plotted)
target_freq = target_freq[:TOP_N]  # the TOP_N most frequent target types
# Drop the "Unknown" category from the plotted targets
target_freq = target_freq[target_freq["Target Type"] != "Unknown"]

fig = px.bar(target_freq, x="Target Type", y="Number of Terrorist Attacks",
             title="Common Targets of Terrorist Attacks")
fig.update_layout(title_x=0.5, height=500,
                  plot_bgcolor="rgba(0,0,0,0)", paper_bgcolor="rgba(0,0,0,0)")
fig.show()
Figure 8: Common Targets of Terrorist Attacks
Most of the attacks have targeted private citizens & property, the military, and the police. Private citizens and their property are generally the easiest targets to attack, which is one possible explanation for the high number of attacks on them.
Now, let’s analyze the relationship between terrorism and socio-economic factors like GDP and fertility rate.
Figure 9: Socio-economic Aspects of Terrorist-prone Countries
Figure 9 shows the GDP and fertility rate of the five most terrorist-prone countries mentioned above. In general, these countries had a lower GDP and a higher fertility rate than the global average over the given period. India is an exception in that its GDP grew at a faster rate than the global average. Similarly, Colombia is an exception in that its fertility rate has been below the global average since the 1980s.
Finally, let’s take the machine learning and deep learning algorithms out of our arsenal and tackle the problem of predicting the number of casualties for any given attack based on the date, country, region, state, city, whether it was a suicide attack, attack type, target type, terrorist group, and weapon used. I believe such a model could help intelligence groups assess the severity of attacks and prepare for them in the future.
The dataset was split into train, validation, and test sets in the ratio 70:15:15. The train and validation sets were used during the training phase, and the test set was used to evaluate each model on two measures: the time it takes and its root-mean-squared (RMS) error. The results are shown in Figure 10.
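The "below the global average" comparison can be sketched as a per-year mean across all countries, compared against each country's own series. The countries and GDP values below are hypothetical, and the unweighted mean across countries is an assumption about how the global average is defined.

```python
import pandas as pd

# Hypothetical GDP values for three countries over two years.
gdp = pd.DataFrame({
    "Country": ["P", "Q", "R"] * 2,
    "Year": [2000] * 3 + [2001] * 3,
    "GDP (in USD)": [10.0, 20.0, 60.0, 12.0, 24.0, 84.0],
})

# Unweighted mean across countries for each year (assumed "global average").
world_mean = gdp.groupby("Year")["GDP (in USD)"].mean().rename("World")

# One column per country, one row per year.
wide = gdp.pivot(index="Year", columns="Country", values="GDP (in USD)")

# True where a country's GDP is below the global average in that year.
below = wide.lt(world_mean, axis=0)
print(below)
```

The same pattern applies to the fertility rate by swapping the value column.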
Code
import matplotlib
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.lines import Line2D

# Model results, sorted from lowest to highest RMS error
result_df = pd.read_csv("../results/results.csv")
result_df = result_df.sort_values(by=["Root Mean Squared Error"])

matplotlib.rc_file_defaults()
sns.set_style(style=None, rc=None)
fig, ax1 = plt.subplots(figsize=(12, 6))

# Bars: RMS error per model (first four bars purple, remaining three blue)
colors = ["#5D3FD3", "#5D3FD3", "#5D3FD3", "#5D3FD3", "#0096FF", "#0096FF", "#0096FF"]
sns.barplot(data=result_df, x="Model", y="Root Mean Squared Error", alpha=0.5, ax=ax1, palette=colors)
ax1.tick_params(axis="x", labelsize=12)
ax1.set_xlabel("Models", fontsize=14)
ax1.set_ylabel("Root Mean Squared Error", fontsize=14)
ax1.set_title("Efficiency of Models", fontsize=16)

# Secondary y-axis for the time taken by each model
ax2 = ax1.twinx()
ax2.set_ylabel("Time (in seconds)", fontsize=14)

# Custom legend; the patches use the same transparency as the bars
dl = mpatches.Patch(color="#5D3FD3", alpha=0.5)
ml = mpatches.Patch(color="#0096FF", alpha=0.5)
custom_line = [Line2D([0], [0], color="#0096FF", lw=2), dl, ml]
ax2.legend(custom_line, ["Time", "DL Models", "ML Models"], loc="upper left")

# Line: time per model, in the same order as the bars
sns.lineplot(data=list(result_df["Time (in seconds)"]), marker="o", ax=ax2, color="#0096FF")
plt.show()
Figure 10: Efficiency of Models
The feed-forward neural network turned out to be the most accurate model, achieving an RMS error of 8.68, and the decision tree was the fastest model, completing prediction in 0.99 seconds. In general, the neural networks had a lower RMS error than the other machine learning models, but they were also slower to train and test.
Our analysis ends here for now. In the future, the models' hyperparameters will be tuned and the models will be trained on a larger dataset that combines engineered features from different socio-economic factors, with the aim of achieving the lowest possible RMSE.
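For readers who want to reproduce the evaluation setup, here is a minimal sketch of a 70:15:15 split and the RMS error metric. The helper names `split_70_15_15` and `rmse`, the random seed, and the toy data are assumptions made for illustration; the actual pipeline may split and score differently.

```python
import numpy as np
import pandas as pd


def split_70_15_15(df: pd.DataFrame, seed: int = 0):
    """Shuffle the rows and split them into train, validation, and test sets (70:15:15)."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    train = shuffled.iloc[:n_train]
    val = shuffled.iloc[n_train:n_train + n_val]
    test = shuffled.iloc[n_train + n_val:]  # whatever remains
    return train, val, test


def rmse(y_true, y_pred) -> float:
    """Root-mean-squared error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical toy data standing in for the encoded attack features.
toy = pd.DataFrame({"feature": range(100), "Casualties": range(100)})
train, val, test = split_70_15_15(toy)
print(len(train), len(val), len(test))

# RMS error of a naive baseline that always predicts the training mean.
baseline = train["Casualties"].mean()
print(rmse(test["Casualties"], [baseline] * len(test)))
```

Comparing each model against a naive baseline like this gives the reported RMS errors a reference point.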
More Animations
Code
import bar_chart_race as bcr


def cumulative_attacks(df, column):
    """Cumulative number of attacks: one row per year, one column per value of `column`."""
    counts = df.groupby(["Year", column])["Event ID"].count().unstack(fill_value=0)
    # Cumulative sum over the years for every column
    return counts.sort_index().cumsum()


# Bar chart race by country
df_countries_pivot = cumulative_attacks(df_attacks, "Country")
bcr.bar_chart_race(df=df_countries_pivot, n_bars=10, period_length=1000, sort="desc",
                   title="Countries with the Highest Number of Terrorist Attacks",
                   filter_column_colors=True, filename=None)

# Bar chart race by region
df_region_pivot = cumulative_attacks(df_attacks, "Region")
bcr.bar_chart_race(df=df_region_pivot, n_bars=12, period_length=1000, sort="desc",
                   title="Terrorist Attacks Based on Geographical Regions",
                   filter_column_colors=True, filename=None)

# Same regional data, limited to the top 10 bars and played faster
bcr.bar_chart_race(df=df_region_pivot, n_bars=10, period_length=750, sort="desc",
                   title="Terrorist Attacks based on Geographical Region",
                   filter_column_colors=True, filename=None)
References
Countries in the world by population (2023). Worldometer. Retrieved February 5, 2023, from https://www.worldometers.info/world-population/population-by-country/
Information on more than 200,000 terrorist attacks. Global Terrorism Database. Retrieved February 5, 2023, from https://www.start.umd.edu/gtd/
Lai, N. T. C. (2023, February 3). Word population (1955-2020). Kaggle. Retrieved February 5, 2023, from https://www.kaggle.com/datasets/nguyenthicamlai/population-2022
Mishinev, T. (2022, September 9). World, region, country GDP/GDP per capita. Kaggle. Retrieved February 5, 2023, from https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021
National Consortium for the Study of Terrorism and Responses to Terrorism. Global terrorism database. Kaggle. Retrieved February 5, 2023, from https://www.kaggle.com/datasets/START-UMD/gtd
World Bank. GDP (current US$). GDP National Accounts. Retrieved February 5, 2023, from https://data.worldbank.org/indicator/NY.GDP.MKTP.CD
Source Code
---title: "The State of Global Terrorism"subtitle: "An In-Depth Analysis of Trends and Threats"author: "Shreehar Joshi"bibliography: references.bibnumber-sections: falseformat: html: theme: - cosmo rendering: embed-resources code-fold: true code-tools: true pdf: defaultjupyter: python3---Terrorism has been a constant hindrance on mankind's effort to achieve global peace and prosperity. From hostage situations and hijackings to mass shootings and bombings, terrorist attacks have a profound impact on both the victims and the larger society; they cause physical harm and loss of life, as well as emotional trauma and psychological distress. Needless to say, they can have long-lasting socio-economic consequences, disrupting trade and commerce, causing job losses, and decreasing investor confidence.As the frequency of terrorist attacks is increasing at a rate faster than ever, it is crucial to understand them and their trends and patterns. In this blog post, I will be examining various aspects of terrorism including regions, targets, methods, and motives using three open-source datasets: [Global Terrorism Database (GTD)](https://www.kaggle.com/datasets/START-UMD/gtd), which contains information on over 180,000 global terrorist attacks from 1970 to 2017; [World, Region, Country GDP/GDP per capita](https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021), which includes the GDP per Capita of different countries from 1960 to 2021; and the [World Bank National Accounts data](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD), which provides the information on the fertility rate and net migration of each country from 1955 to 2020. I hope this project will shed some light on the phenomenon of global terrorism and will equip us better to combat them in the future. 
So let's roll up our sleeves and demystify the data from the world of global terrorism.```{python echo=FALSE}#| label: fig-import#| fig-cap: "Global Terrorist Attacks"import pandas as pdimport numpy as npimport plotly.express as pximport nltkfrom sklearn.metrics import mean_squared_errorfrom sklearn.ensemble import RandomForestRegressorfrom sklearn.tree import DecisionTreeRegressorfrom sklearn import neighborsimport tensorflow as tffrom PIL import Imagefrom tensorflow.keras.models import Sequentialfrom sklearn.model_selection import train_test_splitfrom tensorflow.keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten, LSTM, SimpleRNNfrom tensorflow.keras.layers import Bidirectional, GRU, UpSampling1Dimport plotly.express as pxfrom sklearn.preprocessing import LabelEncoderfrom wordcloud import WordCloudimport matplotlib.pyplot as pltimport matplotlibimport matplotlib.pyplot as pltimport seaborn as snsfrom matplotlib.lines import Line2Dimport matplotlib.patches as mpatchesimport timeimport warningsimport bar_chart_race as bcrwarnings.filterwarnings("ignore", category=FutureWarning)df_attacks = pd.read_csv("../data/globalterrorismdb_0718dist.csv", encoding="ISO-8859-1", low_memory=False)df_attacks.head()df_attacks = df_attacks[['eventid','iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 'provstate', 'city', 'latitude', 'longitude', 'suicide', 'attacktype1_txt', 'targtype1_txt', 'gname', 'motive', 'weaptype1_txt', 'nkill']]df_attacks.rename(columns={"eventid": "Event ID", "iyear": "Year", "imonth": "Month", "country_txt": "Country", "region_txt": "Region", "provstate": "Province/State", "city": "City", "latitude": "Latitude", "longitude": "Longitude", "suicide": "Suicide", "attacktype1_txt": "Attack Type","targtype1_txt": "Target Type", "gname": "Terrorist Group", "motive": "Motive", "weaptype1_txt": "Weapon Type", "nkill": "Casualties"}, inplace=True)df_population = pd.read_csv("../data/population.csv")df_population = df_population[["Country","Year", 
"Migrants(net)", "FertilityRate"]]df_population.rename(columns= {"FertilityRate": "Fertility Rate", "Migrants(net)": "Migrants (net)"}, inplace=True)df_gdp = pd.read_csv("../data/world_country_gdp_usd.csv")df_gdp = df_gdp[['Country Name','year', 'GDP_USD']]df_gdp.rename(columns= {"Country Name": "Country", "year": "Year", "GDP_USD":"GDP (in USD)", "GDP_per_capita_USD": "GDP (per capita)"}, inplace=True)df_us_population = pd.read_csv("../data/us_population.csv")df_us_population = df_us_population[["state", "pop2022"]]df_us_population.rename(columns= {"state": "State", "pop2022": "Population"}, inplace=True) fig = px.scatter_geo(df_attacks, lon="Longitude", lat="Latitude", animation_frame="Year", color="Region", projection="equirectangular", animation_group="Year", title="Terrorist Attacks (1970 - 2017)")fig.update_layout(title_x=0.44)fig.show()```The animation in @fig-import shows that there were a significant number of terrorist attacks in the US from 1970 to 2017. It is surprising to see this, especially when we consider the effort the US has putover the past 50 years in tackling terrorism in almost every terrorist-prone country. 
So before anything else, let's analyzethe states in the US that have the highest number of such incidents.```{python}#| label: fig-us#| fig-cap: "Terrorist Attacks in the US"us_states = np.asarray(['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA','HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA','MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY','NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX','UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'])us_state_to_abbrev = {"Alabama": "AL","Alaska": "AK","Arizona": "AZ","Arkansas": "AR","California": "CA","Colorado": "CO","Connecticut": "CT","Delaware": "DE","Florida": "FL","Georgia": "GA","Hawaii": "HI","Idaho": "ID","Illinois": "IL","Indiana": "IN","Iowa": "IA","Kansas": "KS","Kentucky": "KY","Louisiana": "LA","Maine": "ME","Maryland": "MD","Massachusetts": "MA","Michigan": "MI","Minnesota": "MN","Mississippi": "MS","Missouri": "MO","Montana": "MT","Nebraska": "NE","Nevada": "NV","New Hampshire": "NH","New Jersey": "NJ","New Mexico": "NM","New York": "NY","North Carolina": "NC","North Dakota": "ND","Ohio": "OH","Oklahoma": "OK","Oregon": "OR","Pennsylvania": "PA","Rhode Island": "RI","South Carolina": "SC","South Dakota": "SD","Tennessee": "TN","Texas": "TX","Utah": "UT","Vermont": "VT","Virginia": "VA","Washington": "WA","West Virginia": "WV","Wisconsin": "WI","Wyoming": "WY","District of Columbia": "DC","American Samoa": "AS","Guam": "GU","Northern Mariana Islands": "MP","Puerto Rico": "PR","United States Minor Outlying Islands": "UM","U.S. 
Virgin Islands": "VI",}df_attacks_us = df_attacks[df_attacks["Country"] =="United States"] df_attacks_us = pd.DataFrame(df_attacks_us.groupby("Province/State")["Event ID"].count())df_attacks_us = df_attacks_us.reset_index()df_attacks_us.rename(columns={"Province/State": "State", "Event ID": "Number of Terrorist Attacks"}, inplace=True)df_attacks_us = df_attacks_us[df_attacks_us["State"] !="Unknown"]df_attacks_us["State Code"] = df_attacks_us["State"].apply(lambda x: us_state_to_abbrev[x])def scale_column(df, column, minVal=float('-inf'), maxVal=float('inf')):if minVal ==float('-inf'): minVal =min(df[column])if maxVal ==float('inf'): maxVal =max(df[column]) res = []for val in df[column]: res.append((val - minVal) / (maxVal - minVal))return resdf_us_population.head()df_attacks_us = df_attacks_us.merge(df_us_population[['State', 'Population']])df_attacks_us["Number of Terrorist Attacks (Standardised)"] = df_attacks_us["Number of Terrorist Attacks"] / df_attacks_us["Population"]tempVal = scale_column(df_attacks_us, "Number of Terrorist Attacks (Standardised)")df_attacks_us["Number of Terrorist Attacks (Standardised)"] = tempValdf_attacks_us = df_attacks_us.sort_values(by="Number of Terrorist Attacks (Standardised)", ascending=False)fig = px.choropleth(df_attacks_us, locations='State Code', color='Number of Terrorist Attacks (Standardised)', color_continuous_scale="Viridis", locationmode="USA-states", scope="usa", labels={'Number of Terrorist Attacks (Standardised)':'No. 
of Terrorist Attacks'}, title="Terrorist Attacks in the US (1970-2017)")fig.update_layout(title_x=0.44)fig.update_layout( legend = {"xanchor": "right", "x": -0, "y":1.9})fig.update_layout(height=500, width=780)fig.show()```@fig-us shows different states in the US with a varying number of terrorist attacks, which has been calculated by dividing the total number of terrorist attacks in a given state by its population and standardizing in such a way the state with the highest scoreis assigned a value of 1 and the least score is assigned a value of 0. We see that New York, Oregon, California, Washington,and Nebraska are the five most terrorist-prone states in the US and Kentucky, South Carolina, West Virginia, Alaska, and Arkansasare the safest states in terms of the frequency of terrorist attacks. It might be interesting to see the motives behind the terrorist attacks in the US. So, let's explore them, and to be more specific, let's compare the motives of the terrorist attacks from 1970-1999 and 2000-2017. 
```{python}#| label: fig-motives#| layout-ncol: 2#| fig-cap: "Attack Motives in the US"#| fig-subcap: #| - "1970-1999"#| - "2000-2017"stpwrd = nltk.corpus.stopwords.words('english')extended_list = ["specific", "motive", "unknown", "Unknown", "incident", "claimed", "responsibility", "however", "unaffiliated", "individual", "identified", "killed", "stated", "anti", "attacks", "protest", "carried", "attack", "trend", "larger", "may", "part", "following", "community", "sources", "violence", "targeting", "noted", "posited", "suspected", "targeting", "members", "noted", "targeted", "also", "assailant", "perpetrator", "meant", "bring attention", "practice", "perpetrator", "assailant", "meant", "bring", "attention"]stpwrd.extend(extended_list)df_attacks_us = df_attacks[df_attacks["Country"] =="United States"]df_attacks_us = df_attacks_us[["Year", "Motive"]]df_attacks_us = df_attacks_us.dropna()temp_df = df_attacks_us[(df_attacks_us["Year"] >=1970) & (df_attacks_us["Year"] < (2000))]motive =list(temp_df["Motive"].values)motive =" ".join(motive)wordcloud = WordCloud(width=1000, height=800, background_color ='white', stopwords=stpwrd, color_func=lambda*args, **kwargs: "green", min_font_size =10).generate(motive)plt.figure(figsize = (12, 12), facecolor =None) plt.imshow(wordcloud) plt.axis("off")plt.tight_layout(pad =2)plt.title("Attack Motives ("+str(1970) +" - "+str(1999) +")", fontdict={'fontsize': 36})plt.show()stpwrd = nltk.corpus.stopwords.words('english')stpwrd.extend(extended_list)df_attacks_us = df_attacks[df_attacks["Country"] =="United States"]df_attacks_us = df_attacks_us[["Year", "Motive"]]df_attacks_us = df_attacks_us.dropna()temp_df = df_attacks_us[(df_attacks_us["Year"] >=2000) & (df_attacks_us["Year"] <= (2017))]motive =list(temp_df["Motive"].values)motive =" ".join(motive)wordcloud = WordCloud(width=1000, height=800, background_color ='white', stopwords=stpwrd, color_func=lambda*args, **kwargs: "purple", min_font_size =10).generate(motive)plt.figure(figsize = 
(12, 12), facecolor =None) plt.imshow(wordcloud) plt.axis("off")plt.tight_layout(pad =2)plt.title("Attack Motives ("+str(2000) +" - "+str(2017) +")", fontdict={'fontsize': 36})plt.show()```Both the word clouds share a common theme of abortion, suggesting that this has been a prominent topic of discussion and conflict for several decades. However, the word-clouds also differ in significant ways. The first word-cloud, which pertains to the pre-2000 period, reveals issues that were relevant to Puerto Rico, Vietnam, and African American groups. The second word-cloud, which represents the post-2000 period, shows themes that are related to Iraq, ISIL, and Islamic states. These topics suggest that there has been an increase in religiously motivated attacks over the past 20 years. This shift in topics also reflects changes in the political landscape both domestically and internationally from fighting against the spread of communism and racism to religiously motivated terrorism.Now, let's analyze how the frequency of terrorist attacks has changed over the last 50 years. ```{python}#| label: fig-frequency#| fig-cap: "Frequency of Terrorist Attacks"yearly_freq = pd.DataFrame(df_attacks.groupby("Year")["Event ID"].count()).reset_index()yearly_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)fig = px.bar(yearly_freq, x=yearly_freq["Year"], y=yearly_freq["Number of Terrorist Attacks"], title="Frequency of Terrorist Attacks (1970-2017)")fig.update_layout(title_x=0.5)fig.update_layout(height=400)fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.show()```It is apparent from @fig-frequency that the number of terrorist attacks was at its minimum around the years 1972 and 2003 (it is worth mentioning that the data for 1994 was missing and not 0) and has greatly increased over the last decade. But, what parts of the world have experienced the highest number of terrorist attacks? 
```{python}#| label: fig-regions#| fig-cap: "Terrorist Attacks in different Regions"region_freq = pd.DataFrame(df_attacks.groupby(["Region", "Attack Type"])["Event ID"].count()).reset_index()region_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)region_freq = region_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)region_freq['Attack Type'] = region_freq['Attack Type'].replace(['Bombing/Explosion', 'Hostage Taking (Kidnapping)', 'Facility/Infrastructure Attack', 'Hostage Taking (Barricade Incident)'], ['Bombing', 'Hostage', 'Facility Attack', 'Hostage (Barr.)'])fig = px.bar(region_freq, x=region_freq["Region"], y=region_freq["Number of Terrorist Attacks"], color="Attack Type", height=400, title="Terrorist Attacks in Different Regions", barmode="relative")fig.update_layout(title_x=0.5)fig.update_layout(height=500)fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.show()```@fig-regions shows that the Middle East & North Africa, South Asia, and South America were the three most terrorist-prone regions. On the other hand, Australasia & Oceania, Central Asia, and East Asia were the safest regions in terms of terrorism. 
It is also worth noting that in all the geographical regions, the terrorist groups used bombing and armed assault as the most common form of attacks.Let's delve deeper to see which countries from these terrorist-prone regions were contributing the highest number of terrorist incidents.```{python}#| label: fig-countries#| fig-cap: "Countries with the Highest Number of Attacks"df_countries_casualties = pd.DataFrame(df_attacks.groupby(["Country"])["Casualties"].sum().reset_index())df_countries_terrorist_count = pd.DataFrame(df_attacks.groupby(["Country"])["Event ID"].count().reset_index())df_countries_terrorist_count.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)df_merged_casualties_count = df_countries_casualties.merge(df_countries_terrorist_count[["Country", "Number of Terrorist Attacks"]])df_iso_codes = px.data.gapminder()[["country", "iso_alpha"]]df_iso_codes.rename(columns={"country": "Country", "iso_alpha": "Country Code"}, inplace=True)df_iso_codes.drop_duplicates(inplace=True)df_iso_codes = df_iso_codes.reset_index()df_iso_codes.drop(["index"], axis=1, inplace=True)df_countries_terrorist_count = df_countries_terrorist_count.merge(df_iso_codes[['Country', 'Country Code']])fig = px.choropleth(df_countries_terrorist_count, locations="Country Code", color="Number of Terrorist Attacks", hover_name="Country", color_continuous_scale=px.colors.sequential.Plasma, title="Terrorist Attacks (1970 - 2017)")fig.update_layout(title_x=0.44)fig.update_layout(height=500, width=880)fig.show()```In @fig-countries, we see that Iraq in the Middle East; Afghanistan, Pakistan, and India in South Asia, and Colombia in South America were the most terrorist-prone countries. The analysis of global terrorism is incomplete without information on terrorist groups. 
So, let's see the top 15 most notorious terrorist groups based on the number of casualties from the attacks they have orchestrated.```{python}#| label: fig-groups#| fig-cap: "Attacks by Different Terrorist Groups"groupwise_casualty_freq = pd.DataFrame(df_attacks.groupby("Terrorist Group")["Casualties"].sum()).reset_index()groupwise_casualty_freq = groupwise_casualty_freq.sort_values(by="Casualties", ascending=False)[:16]notorious_groups =list(groupwise_casualty_freq["Terrorist Group"])notorious_groups.remove("Unknown")df_notorious_groups = df_attacks[df_attacks["Terrorist Group"].isin(notorious_groups)]df_notorious_groups = pd.DataFrame(df_notorious_groups.groupby(["Terrorist Group", "Year"])["Casualties"].sum().reset_index())df_notorious_groups["Terrorist Group"] = df_notorious_groups["Terrorist Group"].replace(["Farabundo Marti National Liberation Front (FMLN)", "Islamic State of Iraq and the Levant (ISIL)", "Kurdistan Workers' Party (PKK)", "Liberation Tigers of Tamil Eelam (LTTE)", "New People's Army (NPA)", "Nicaraguan Democratic Force (FDN)", "Revolutionary Armed Forces of Colombia (FARC)", "Shining Path (SL)", "Tehrik-i-Taliban Pakistan (TTP)"], ["Farbundo Liberation", "ISIL", "Kurdistan W.", "Tamil Tigers", "New People's Army", "Nicaraguan Force", "Colombian Force", "Shining Path", "Taliban Pakistan"])fig = px.line(df_notorious_groups, x="Year", y="Casualties", color="Terrorist Group", title='Attacks by different Terrorist Groups')fig.update_layout(title_x=0.5)fig.update_layout(height=500)fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.show()```One cannot fail to notice the peak in 2001 for Al Qaida, which is widely taken as the beginning of the rise of other Islamic religious extremist terrorist groups like Taliban, Al-Shabaab, and Boko Haram. 
The Taliban, Boko Haram, and ISIL, as is evident from the steep lines after 2010 in @fig-groups, appear to have caused more casualties than the other 12 terrorist groups combined over the last 50 years.

So what exactly do these terrorist groups target? Let's find out.

```{python}
#| label: fig-targets
#| fig-cap: "Common Targets of Terrorist Attacks"

TOP_N = 11

# Number of attacks per target type
target_freq = pd.DataFrame(df_attacks.groupby("Target Type")["Event ID"].count()).reset_index()
target_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)

# Keep the TOP_N most frequent target types and drop the "Unknown" bucket
rem_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[TOP_N:]
target_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:TOP_N]
target_freq = target_freq[target_freq['Target Type'] != "Unknown"]

fig = px.bar(target_freq, x='Target Type', y='Number of Terrorist Attacks', title="Common Targets of Terrorist Attacks")
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.show()
```

Most attacks targeted private citizens & property, the military, and the police. Private citizens and their property are generally the easiest targets, which may partly explain the high number of attacks on them.
Now, let's analyze the relationship between terrorism and socio-economic factors such as GDP and fertility rate.

```{python}
#| label: fig-socioeconomic
#| layout-nrow: 2
#| fig-cap: "Socio-economic Aspects of Terrorist-prone Countries"
#| fig-subcap: 
#|   - "GDP"
#|   - "Fertility Rate"

def map_region(country):
    # Look up the region of a country from its first recorded attack
    region = list(df_attacks[df_attacks["Country"] == country]["Region"])[0]
    return region

# Ten countries with the most attacks; the first five are used in the plots
country_freq = pd.DataFrame(df_attacks.groupby("Country")["Event ID"].count()).reset_index()
country_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
country_freq = country_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:10]
country_freq["Region"] = country_freq["Country"].apply(map_region)
top_five_countries = list(country_freq["Country"].values)[:5]

country_freq_year = pd.DataFrame(df_attacks.groupby(["Year", "Country"])["Event ID"].count().reset_index())
country_freq_year = country_freq_year[country_freq_year["Country"].isin(top_five_countries)]
country_freq_year.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)

# GDP: yearly world average plus one column per top-five country
df_terrorist_gdp = df_gdp[(df_gdp["Country"].isin(top_five_countries)) & ((df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017))]
df_all_gdp = df_gdp[((df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017))]
df_all_gdp = df_all_gdp.dropna()
# numeric_only=True averages numeric columns only; pandas 2.x raises on text columns such as Country otherwise
df_all_gdp = pd.DataFrame(df_all_gdp.groupby("Year").mean(numeric_only=True).reset_index())
df_all_gdp.rename(columns={"GDP (in USD)": "World"}, inplace=True)

# Draw the world average in black, ahead of the country colors
colorList = list(px.colors.qualitative.T10)
if colorList[0] != "black":
    colorList.insert(0, "black")

for country in top_five_countries:
    temp_gdp = df_terrorist_gdp[df_terrorist_gdp["Country"] == country]
    df_all_gdp[country] = list(temp_gdp["GDP (in USD)"])

fig = px.line(df_all_gdp, x='Year', y=df_all_gdp.columns[1:], title="GDP of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={"value": "GDP (in USD)", "variable": ""})
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.update_layout(title_x=0.5)
fig.update_layout(height=400, width=800)
fig.show()

# Fertility rate: same construction as for GDP
df_all_fertility = df_population[(df_population["Year"] >= 1970) & (df_population["Year"] <= 2017)]
df_terrorist_fertility = df_population[(df_population["Country"].isin(top_five_countries)) & ((df_population["Year"] >= 1970) & (df_population["Year"] <= 2017))]
df_all_fertility = df_all_fertility.dropna()
df_all_fertility = df_all_fertility.drop(['Migrants (net)'], axis=1)
df_all_fertility = pd.DataFrame(df_all_fertility.groupby("Year").mean(numeric_only=True).reset_index())
df_all_fertility.rename(columns={"Fertility Rate": "World"}, inplace=True)

for country in top_five_countries:
    temp_fertility = df_terrorist_fertility[df_terrorist_fertility["Country"] == country]
    df_all_fertility[country] = list(temp_fertility["Fertility Rate"])

fig = px.line(df_all_fertility, x='Year', y=df_all_fertility.columns[1:], title="Fertility Rate of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={"value": "Fertility Rate", "variable": ""})
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.update_layout(title_x=0.5)
fig.update_layout(height=400, width=800)
fig.show()
```

@fig-socioeconomic shows the GDP and fertility rate of the five most terrorist-prone countries identified above. All of these countries had a lower GDP and a higher fertility rate than the global average over the given period. India is an exception in that its GDP grew faster than the global average.
Similarly, Colombia is an exception in that its fertility rate has been below the global average since the 1980s.

```{python}
#| eval: false

# Drop the columns that are not used as features
try:
    del df_attacks["Event ID"]
    del df_attacks["Motive"]
    del df_attacks["Latitude"]
    del df_attacks["Longitude"]
except KeyError:
    print("Some of the columns are not present")

df_attacks = df_attacks.dropna()

# Encode the categorical columns as integer labels
categorical_columns = ['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']
df_attacks[categorical_columns] = df_attacks[categorical_columns].apply(LabelEncoder().fit_transform)

y = df_attacks["Casualties"]
X = df_attacks.drop(['Casualties'], axis=1)

# Split the data into train (70%), validation (15%), and test (15%) sets.
# After holding out 15% for testing, 85% remains, so the validation share is 0.15 / 0.85 of it.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.15 / 0.85, random_state=42)

# Fit the scaler on the training set only and reuse it for the other two sets
scaler = RobustScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_val = scaler.transform(X_val)

def create_bilstm():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, activation='relu', input_shape=(12,1), return_sequences=True)))
    model.add(Dropout(0.2))
    model.add(Bidirectional(LSTM(64, activation='relu')))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    return model

def create_ffnn():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(12,)))
    model.add(Dropout(0.3))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='sigmoid'))
    model.add(Dense(16, activation='tanh'))
    model.add(Dense(1))
    return model

def create_cnn():
    model = Sequential()
    model.add(Conv1D(32, 3, activation='relu', input_shape=(12,1)))
    model.add(MaxPooling1D(2))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(MaxPooling1D(2))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1))
    return model

def create_gru():
    model = Sequential()
    model.add(GRU(64, activation='tanh', input_shape=(12,1)))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    return model

result = []
dlModels = {"Feed Forward NN": create_ffnn(), "CNN": create_cnn(), "GRU": create_gru(), "Bi-LSTM": create_bilstm()}

# Add a trailing channel dimension for the model that is fed 3-D input
X_train_new = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val_new = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test_new = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Train each deep learning model and record its test RMSE and total run time
for name, model in dlModels.items():
    start_time = time.time()
    model.compile(optimizer='adam', loss='mse')
    if name == "Bi-LSTM":
        model.fit(X_train_new, y_train, epochs=20, batch_size=128, validation_data=(X_val_new, y_val))
        y_pred = model.predict(X_test_new)
    else:
        model.fit(X_train, y_train, epochs=20, batch_size=128, validation_data=(X_val, y_val))
        y_pred = model.predict(X_test)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, y_pred)), 2), round(time.time() - start_time, 2)])

# Same measurement for the classical machine learning models
mlModels = {"Random Forest": RandomForestRegressor(), "K Neighbors": neighbors.KNeighborsRegressor(), "Decision Trees": DecisionTreeRegressor()}
for name, model in mlModels.items():
    start_time = time.time()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, pred)), 2), round(time.time() - start_time, 2)])

pd.options.display.float_format = '{:.2f}'.format
result_df = pd.DataFrame(result, columns=["Model", "Root Mean Squared Error", "Time (in seconds)"])
result_df.to_csv("./results.csv")
```

Finally, let's take machine learning and deep learning algorithms out of our arsenal and tackle the problem of predicting the number of casualties for a given attack based on its date, country, region, state, city, suicidal intent, attack type, target type, terrorist group, and weapon.
We believe such a model could help intelligence agencies assess the severity of attacks and prepare for them in the future.

The dataset was split into training, validation, and test sets in a 70:15:15 ratio. The training and validation sets were used during the training phase, and the test set was used to evaluate each model on its root-mean-squared (RMS) error and on the time it takes. The results are shown in @fig-results.

```{python}
#| label: fig-results
#| fig-cap: "Efficiency of Models"

result_df = pd.read_csv("../results/results.csv")
result_df = result_df.sort_values(by=['Root Mean Squared Error'])

matplotlib.rc_file_defaults()
ax1 = sns.set_style(style=None, rc=None)
fig, ax1 = plt.subplots(figsize=(12,6))

# Bars: RMSE per model (purple for deep learning, blue for classical machine learning)
colors = ["#5D3FD3", "#5D3FD3", "#5D3FD3", "#5D3FD3", "#0096FF", "#0096FF", "#0096FF"]
sns.barplot(data=result_df, x='Model', y='Root Mean Squared Error', alpha=0.5, ax=ax1, palette=colors)
ax1.set_xticklabels(ax1.get_xticklabels(), fontsize=12)
ax1.set_xlabel("Models", fontsize=14)
ax1.set_ylabel("Root Mean Squared Error", fontsize=14)
ax1.set_title("Efficiency of Models", fontsize=16)

# Second y-axis for the run time of each model
ax2 = ax1.twinx()
ax2.set_ylabel("Time (in seconds)", fontsize=14)

dl = mpatches.Patch(color="#5D3FD3")
ml = mpatches.Patch(color="#0096FF")
custom_line = [Line2D([0], [0], color='#0096FF', lw=2), dl, ml]
leg = plt.legend(custom_line, ["Time", "DL Models", "ML Models"], loc="upper left")
for index, lh in enumerate(leg.legendHandles):
    if index > 0:
        lh.set_alpha(0.5)

# Line: run time per model, in the same order as the bars
sns.lineplot(data=list(result_df["Time (in seconds)"]), marker='o', ax=ax2, color='#0096FF')
plt.show()
```

The Feed Forward Neural Network turned out to be the most effective model, achieving an RMS error of 8.68, and Decision Trees was the fastest model, completing training and prediction in 0.99 seconds. In general, the neural networks had lower RMS errors than the classical machine learning models, but they were also slower to train and test.
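As a rough illustration of the 70:15:15 split and the RMS error used above, here is a minimal sketch that relies only on the Python standard library. The helper names (`split_70_15_15`, `rmse`) and the toy data are assumptions made for this example; they are not part of the project code, which uses scikit-learn. One detail worth spelling out: when the split is done in two steps, holding out 15% first leaves 85% of the rows, so the second hold-out fraction has to be 0.15 / 0.85 (about 0.176) to end up with 15% of the full dataset.

```python
import math
import random


def split_70_15_15(rows, seed=42):
    """Shuffle rows and split them into train/validation/test (70:15:15)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(0.15 * len(rows))
    # The second hold-out is 0.15 / 0.85 of what remains, i.e. 15% of the total.
    n_val = round((0.15 / 0.85) * (len(rows) - n_test))
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: 200 hypothetical casualty counts (not taken from the GTD).
casualties = list(range(200))
train, val, test = split_70_15_15(casualties)
print(len(train), len(val), len(test))

# Naive baseline: predict the training mean for every test row.
baseline = sum(train) / len(train)
print(round(rmse(test, [baseline] * len(test)), 2))
```

A naive baseline like the one at the end can be a useful yardstick: a model's RMS error is easier to interpret when compared with the error of always predicting the training mean.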
Our analysis ends here for now. In the future, the models' hyperparameters will be tuned and the models will be trained on a larger dataset that combines and feature-engineers different socio-economic factors, with the aim of achieving the lowest possible RMSE.

## More Animations

```{python}
#| eval: false

# Cumulative number of attacks per country and year
df_countries_pivot = pd.DataFrame(df_attacks.groupby(["Country", "Year"]).count()).reset_index()
df_countries_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_countries_pivot = df_countries_pivot.pivot_table(values='Number of Terrorist Attacks', index=['Year'], columns='Country')
df_countries_pivot.fillna(0, inplace=True)
df_countries_pivot.sort_values(list(df_countries_pivot.columns), inplace=True)
df_countries_pivot = df_countries_pivot.sort_index()
df_countries_pivot.iloc[:, 0:-1] = df_countries_pivot.iloc[:, 0:-1].cumsum()
bcr.bar_chart_race(df=df_countries_pivot, n_bars=10, period_length=1000, sort='desc', title="Countries with the Highest Number of Terrorist Attacks", filter_column_colors=True, filename=None)

# Cumulative number of attacks per region and year
df_region_pivot = pd.DataFrame(df_attacks.groupby(["Region", "Year"]).count()).reset_index()
df_region_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_region_pivot = df_region_pivot.pivot_table(values='Number of Terrorist Attacks', index=['Year'], columns='Region')
df_region_pivot.fillna(0, inplace=True)
df_region_pivot.sort_values(list(df_region_pivot.columns), inplace=True)
df_region_pivot = df_region_pivot.sort_index()
df_region_pivot.iloc[:, 0:-1] = df_region_pivot.iloc[:, 0:-1].cumsum()
bcr.bar_chart_race(df=df_region_pivot, n_bars=12, period_length=1000, sort='desc', title="Terrorist Attacks Based on Geographical Regions", filter_column_colors=True, filename=None)

# Same regional race, with 10 bars and a shorter period per frame
df_region_pivot = pd.DataFrame(df_attacks.groupby(["Region", "Year"]).count()).reset_index()
df_region_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_region_pivot = df_region_pivot.pivot_table(values='Number of Terrorist Attacks', index=['Year'], columns='Region')
df_region_pivot.fillna(0, inplace=True)
df_region_pivot.sort_values(list(df_region_pivot.columns), inplace=True)
df_region_pivot = df_region_pivot.sort_index()
df_region_pivot.iloc[:, 0:-1] = df_region_pivot.iloc[:, 0:-1].cumsum()
bcr.bar_chart_race(df=df_region_pivot, n_bars=10, period_length=750, sort='desc', title="Terrorist Attacks based on Geographical Region", filter_column_colors=True, filename=None)
```

<iframe data-external="1" src="https://www.youtube.com/embed/qmtzgzcPTbk"></iframe>

<iframe data-external="1" src="https://www.youtube.com/embed/Fs7OdDxY3sg"></iframe>

<iframe data-external="1" src="https://www.youtube.com/embed/jvOC-eA_Hzo"></iframe>

## References

| Countries in the world by population (2023). Worldometer. Retrieved February 5,
|     2023, from https://www.worldometers.info/world-population/population-by-country/
| Information on more than 200,000 terrorist attacks. Global Terrorism Database.
|     Retrieved February 5, 2023, from https://www.start.umd.edu/gtd/
| Lai, N. T. C. (2023, February 3). Word population (1955-2020). Kaggle. Retrieved February
|     5, 2023, from https://www.kaggle.com/datasets/nguyenthicamlai/population-2022
| Mishinev, T. (2022, September 9). World, region, country GDP/GDP per capita. Kaggle.
|     Retrieved February 5, 2023, from
|     https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021
| National Consortium for the Study of Terrorism and Responses to Terrorism. Global
|     terrorism database. Kaggle. Retrieved February 5, 2023, from
|     https://www.kaggle.com/datasets/START-UMD/gtd
| World Bank. GDP (current US$). GDP National Accounts. Retrieved February 5, 2023, from
|     https://data.worldbank.org/indicator/NY.GDP.MKTP.CD